An Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures

Authors

Mohammadreza Mohaghegh Neyshabouri

Kaan Gokcesu

Huseyin Ozkan

Suleyman S. Kozat

Abstract

We investigate the contextual multi-armed bandit problem in an adversarial setting and introduce an online algorithm that, under certain conditions, asymptotically achieves the performance of the best contextual bandit arm selection strategy. We show that our algorithm is highly efficient and provides significantly improved performance, with a guaranteed performance upper bound in a strong mathematical sense. We make no statistical assumptions on the context vectors or on the losses of the bandit arms; hence, our results are guaranteed to hold even in adversarial environments. We use a tree to partition the space of context vectors into a nested structure. Using this tree, we construct a large class of context-dependent bandit arm selection strategies and adaptively combine them to achieve the performance of the best strategy. We exploit the hierarchical nature of the tree to implement this combination with very low computational complexity, so our algorithm can be used efficiently in applications involving big data. Through an extensive set of experiments on synthetic and real data, we demonstrate significant performance gains of the proposed algorithm over state-of-the-art adversarial bandit algorithms.
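The abstract describes the approach only at a high level. The Python sketch below illustrates the general idea under several assumptions of our own:

- the context is one-dimensional, in [0, 1);
- the tree is a complete binary tree of depth D over dyadic intervals;
- a "strategy" is a pruning of that tree plus one arm assigned to each leaf of the pruning;
- the strategies are combined by an Exp3-style exponentially weighted mixture, computed recursively along the root-to-leaf path in the style of context-tree weighting.

The class name `TreeMixtureBandit`, the prior (1/2 stop, 1/2 split, uniform arm per leaf), the parameters `eta` and `gamma`, and the toy environment in `run_demo` are all hypothetical choices for illustration. This is not the authors' algorithm or their tuning.

```python
import math
import random
from collections import defaultdict


class TreeMixtureBandit:
    """Hypothetical sketch: exponentially weighted mixture over all prunings of a
    depth-D dyadic tree on [0, 1), each leaf of a pruning assigned one arm."""

    def __init__(self, n_arms, depth, eta, gamma, seed=0):
        self.K, self.D, self.eta, self.gamma = n_arms, depth, eta, gamma
        self.rng = random.Random(seed)
        # Importance-weighted cumulative loss estimates, per node and per arm.
        self.L = defaultdict(lambda: [0.0] * n_arms)
        # Mixture weight W(n) of the subtree rooted at node n; unvisited nodes have W = 1.
        self.W = defaultdict(lambda: 1.0)

    def _path(self, x):
        # Nodes (depth, index) whose interval [i/2^d, (i+1)/2^d) contains x.
        return [(d, min(int(x * 2 ** d), 2 ** d - 1)) for d in range(self.D + 1)]

    def _leaf_part(self, node):
        # (1/K) * sum_a exp(-eta * L_n[a]): mass of "stop at n and pick one arm".
        # A long-running version would keep these weights in log space.
        return sum(math.exp(-self.eta * l) for l in self.L[node]) / self.K

    def probabilities(self, x):
        path = self._path(x)
        q = [0.0] * self.K
        prefix = 1.0  # product of split factors and sibling weights above the node
        for d, node in enumerate(path):
            stop = 1.0 if d == self.D else 0.5  # depth-D nodes are always leaves
            for a in range(self.K):
                q[a] += prefix * stop * math.exp(-self.eta * self.L[node][a]) / self.K
            if d < self.D:
                depth, idx = path[d + 1]
                prefix *= 0.5 * self.W[(depth, idx ^ 1)]  # split, sibling subtree is free
        total = self.W[path[0]]  # equals sum(q) when the W values are consistent
        return [(1 - self.gamma) * qa / total + self.gamma / self.K for qa in q]

    def select(self, x):
        probs = self.probabilities(x)
        arm = self.rng.choices(range(self.K), weights=probs)[0]
        return arm, probs

    def update(self, x, arm, loss, probs):
        path = self._path(x)
        est = loss / probs[arm]  # unbiased estimate of the chosen arm's loss
        for node in path:
            self.L[node][arm] += est
        # Recompute subtree weights bottom-up; only nodes on this path changed.
        for d in range(self.D, -1, -1):
            node = path[d]
            if d == self.D:
                self.W[node] = self._leaf_part(node)
            else:
                depth, idx = path[d + 1]
                children = self.W[path[d + 1]] * self.W[(depth, idx ^ 1)]
                self.W[node] = 0.5 * self._leaf_part(node) + 0.5 * children


def run_demo(T=2000, seed=1):
    # Hypothetical toy environment: arm 0 is best for x < 0.5, arm 1 otherwise.
    env = random.Random(seed)
    learner = TreeMixtureBandit(n_arms=2, depth=3, eta=0.05, gamma=0.05)
    for _ in range(T):
        x = env.random()
        arm, probs = learner.select(x)
        best = 0 if x < 0.5 else 1
        learner.update(x, arm, 0.0 if arm == best else 1.0, probs)
    return learner
```

Each round touches only the D + 1 nodes on the context's path, so the per-round cost in this sketch is O(DK). This holds even though the number of prunings grows doubly exponentially in D, which illustrates why a hierarchical structure can keep such a combination cheap.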

Similar resources

Avoiding the Exploration-Exploitation Tradeoff in Contextual Bandits

The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in many practical settings where exploration may be prohibitively costly or unethical (e.g., cli...

Efficient Beam Alignment in Millimeter Wave Systems Using Contextual Bandits

In this paper, we investigate the problem of beam alignment in millimeter wave (mmWave) systems, and design an optimal algorithm to reduce the overhead. Specifically, due to directional communications, the transmitter and receiver beams need to be aligned, which incurs high delay overhead since without a priori knowledge of the transmitter/receiver location, the search space spans the entire an...

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest number of other arms, is characterized by deriving an asymptotic regret bound. We propose Copeland Winners Relative Minimum Empirical Divergence (C...

An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem

We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that O ( ln( δ ) ln(n) 2 2...

Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of t...
Publication date: 2016